Day 17 - Regular expressions - Groups

$ cat examples.txt | grep -oP "[A-Za-z ]+(?=[0-9]+)"
Police
H
R
D
Johnny
Cyborg

The immediate evolution of the positive lookahead is the negative lookahead, which is expressed by (?!...). You can start to see a pattern here (apt, speaking of regular expressions). Lookaround groups are introduced by (? and followed by a criterion, which can be equality (=), meaning the text that follows must match, or inequality (!), meaning it must not.
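To see the two criteria side by side, here is a small sketch that runs a positive and a negative lookahead on the same input. The input lines (R2-D2 and C3PO) are made up for illustration, since examples.txt is not reproduced here, and the commands assume GNU grep built with PCRE support, which the -P flag requires.

```shell
# Made-up input: two lines, "R2-D2" and "C3PO".

# Positive lookahead: an uppercase letter that IS followed by a digit.
printf 'R2-D2\nC3PO\n' | grep -oP '[A-Z](?=[0-9])'
# prints R, D and C, one per line

# Negative lookahead: an uppercase letter that is NOT followed by a digit.
printf 'R2-D2\nC3PO\n' | grep -oP '[A-Z](?![0-9])'
# prints P and O, one per line (O is followed by the end of the line)
```

In both cases the digit is never printed: the lookahead only checks what comes next, it does not make it part of the match.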

You can also use lookbehind expressions, which start with (?< instead of a simple (?. These expressions match a pattern only when it is preceded by what the lookbehind group describes, for example:

$ cat examples.txt | grep -oP "(?<=[A-Z])[a-z]+"
ug
og
olice
ohnny
pider
an
yborg
ig
ad
olf
ony
ictures

That matches every run of lowercase letters that follows an uppercase letter, without including the uppercase letter itself. The lookbehind expression is (?<=[A-Z]).
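The difference from an ordinary pattern is easiest to see by running both versions on the same input. This sketch uses a made-up input (Spider Man and Cyborg, chosen only for illustration) and, like the command above, assumes GNU grep with -P support.

```shell
# Made-up input: "Spider Man" and "Cyborg" on two lines.

# Without lookbehind, the uppercase letter is part of the match.
printf 'Spider Man\nCyborg\n' | grep -oP '[A-Z][a-z]+'
# prints Spider, Man and Cyborg, one per line

# With lookbehind, the uppercase letter must be there, but it is not printed.
printf 'Spider Man\nCyborg\n' | grep -oP '(?<=[A-Z])[a-z]+'
# prints pider, an and yborg, one per line
```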

A warning: lookaround expressions are often difficult to manage, and their behaviour can be surprising because it strongly depends on the implementation of the engine. You won't hit such complex cases while you are just starting out with groups, but you might in the future. For the time being, please keep in mind that there are important things to learn about regular expression engines, such as whether they are greedy or not. This book is meant to be a primer, so I will simply pretend those issues do not exist, but remember that there is a lot to learn out there!
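As a small taste of the kind of surprise mentioned in the warning, consider a negative lookahead placed after a greedy quantifier. The input file1.txt is made up, and the outputs noted in the comments assume GNU grep with PCRE support (-P).

```shell
# Intended: runs of lowercase letters NOT followed by a digit.
printf 'file1.txt\n' | grep -oP '[a-z]+(?![0-9])'
# prints "fil" and "txt": "file" fails the check (a "1" follows it),
# so the engine gives back one letter and "fil" passes (an "e" follows it)

# Forbidding a following lowercase letter as well removes that way out.
printf 'file1.txt\n' | grep -oP '[a-z]+(?![0-9a-z])'
# prints only "txt"
```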

Back-references are actually supported by grep, but their behaviour can be surprising. The code

$ cat examples.txt | grep -E "[A-Z]([0-9])-[A-Z]\1"
R2-D2